Multivariate data analysis:
The French way 

Susan Holmes

Stanford University 

Abstract: This paper presents exploratory techniques for multivariate data, 
many of them well known to French statisticians and ecologists, but few well 
understood in North American culture. We present the general framework 
of duality diagrams which encompasses discriminant analysis, correspondence 
analysis and principal components, and we show how this framework can be 
generalized to the regression of graphs on covariates. 



1. Motivation 

David Freedman is well known for his interest in multivariate projections [5] and
his skepticism with regard to model-based multivariate inference, in particular in
cases where the numbers of variables and observations are of the same order (see
Freedman and Peters [12, 13]).

Brought up in a completely foreign culture, I would like to share an alien
approach to some modern multivariate statistics that is not well known in North
American statistical culture. I have written the paper 'the French way', with
theorems and abstract formulation in the beginning and examples in the latter
sections; Americans are welcome to skip ahead to the motivating examples.

Some French statisticians, fed Bourbakist mathematics and category theory in
the 60's and 70's as all mathematicians were in France at the time, suffered from
abstraction envy. Having completely rejected the probabilistic enterprise as
useless for practical reasons, they composed their own abstract framework for
talking about data in a geometrical context. I will explain the framework known
as the duality diagram developed by Cazes, Cailliez, Pagès, Escoufier and their
followers. I will try to show how aspects of the general framework are still
useful today and how much every idea from Benzécri's correspondence analysis to
Escoufier's conjoint analysis has been rediscovered many times. Section 2.1 sets
out the abstract picture. Sections 2.2-2.6 treat extensions of classical
multivariate techniques from this unified view: principal components analysis,
instrumental variables, canonical correlation analysis, discriminant analysis
and correspondence analysis. Section 3 shows how the methods apply to the
analysis of network data.
2. The duality diagram

Established by the French school of "Analyse des Données" in the early 1970's,
this approach was only published in a few texts [1] and technical reports [9], none
of which were translated into English. My Ph.D. advisor, Yves Escoufier [8, 10],
publicized the method to biologists and ecologists, presenting a formulation based
on his RV-coefficient that I will develop below. The first software implementation
of the duality based methods described here was done in LEAS (1984), a Pascal
program written for Apple II computers. The most recent implementation is the R
package ade4 (see Appendix A for a review of various implementations of the
methods described here).



2.1. Notation 



The data are $p$ variables measured on $n$ observations. They are recorded in a matrix
$X$ with $n$ rows (the observations) and $p$ columns (the variables). $D$ is an
$n \times n$ matrix of weights on the "observations", which is most often diagonal.
We will also use a "neighborhood" relation (thought of as a metric on the
observations) defined by taking a symmetric positive definite matrix $Q$. For
example, to standardize the variables $Q$ can be chosen as

\[ Q = \mathrm{diag}\left( \frac{1}{\sigma_1^2}, \frac{1}{\sigma_2^2}, \ldots, \frac{1}{\sigma_p^2} \right). \]
These three matrices form the essential "triple" $(X, Q, D)$ defining a multivariate
data analysis. As the approach here is geometrical, it is important to see that $Q$
and $D$ define geometries or inner products in $\mathbb{R}^p$ and $\mathbb{R}^n$,
respectively, through

\[ x^t Q y = \langle x, y \rangle_Q, \qquad x, y \in \mathbb{R}^p, \]

\[ v^t D w = \langle v, w \rangle_D, \qquad v, w \in \mathbb{R}^n. \]

From these definitions we see there is a close relation between this approach and
kernel based methods; for more details see [24]. $Q$ can be seen as a linear function
from $\mathbb{R}^p$ to $\mathbb{R}^{p*} = \mathcal{L}(\mathbb{R}^p)$, the space of scalar
linear functions on $\mathbb{R}^p$. $D$ can be seen as a linear function from
$\mathbb{R}^n$ to $\mathbb{R}^{n*} = \mathcal{L}(\mathbb{R}^n)$. Escoufier [8] proposed to
associate to a data set an operator from the space of observations $\mathbb{R}^p$ into
the dual of the space of variables $\mathbb{R}^{n*}$. This is summarized in the following
diagram [1], which is made commutative by defining $V$ and $W$ as $X^tDX$ and $XQX^t$
respectively (commutative just says that $VQ = X^tDXQ$ and $WD = XQX^tD$).

We call VQ the characterizing operator of the diagram. 

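\[
\begin{array}{ccc}
\mathbb{R}^{p*} & \xrightarrow{\;X\;} & \mathbb{R}^{n} \\
Q \uparrow \; \downarrow V & & D \downarrow \; \uparrow W \\
\mathbb{R}^{p} & \xleftarrow{\;X^t\;} & \mathbb{R}^{n*}
\end{array}
\]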
This is known as the duality diagram because knowledge of the eigendecomposition
of $X^tDXQ = VQ$ leads to that of the dual operator $XQX^tD$. The main
consequence is an easy transition between principal components and principal axes
as we will see in the next section. The terms duality diagram or triple are often
used interchangeably.

Remarks. 

1. The duality diagram is equivalent to a triple of three matrices $(X, Q, D)$ such
that $X$ is $n \times p$ and $Q$ and $D$ are symmetric matrices of the right size
($Q$ is $p \times p$ and $D$ is $n \times n$). The operators defined as
$XQX^tD = WD$ and $X^tDXQ = VQ$ are called the characteristic operators of the
diagram [8]. We say an operator $O$ is $B$-symmetric if
$\langle x, Oy \rangle_B = \langle Ox, y \rangle_B$, or equivalently $BO = O^tB$.
In particular, $VQ$ is $Q$-symmetric and $WD$ is $D$-symmetric.

2. $V = X^tDX$ will be the variance-covariance matrix if $X$ is centered with
regard to $D$ ($X^t D 1_n = 0$) and $D$ is the diagonal matrix with all diagonal
elements equal to $\frac{1}{n}$.

3. There is an important symmetry between the rows and columns of X in the 
diagram, and one can imagine situations where the role of observation or 
variable is not uniquely defined. For instance in microarray studies the genes 
can be considered either as variables or observations. This makes sense in 
many contemporary situations which evade the more classical notion of n 
observations seen as a random sample of a population. It is certainly not the 
case that the 30,000 probes are a sample of genes since these probes try to be 
an exhaustive set. 

2.1.1. Properties of the diagram 

Here are some of the properties that prove useful in various settings: 

• Rank of the diagram: $X$, $X^t$, $VQ$ and $WD$ all have the same rank $r$, which
will usually be smaller than both $n$ and $p$.

• For $Q$ and $D$ symmetric matrices, $VQ$ and $WD$ are diagonalisable and have
the same eigenvalues. We denote them in decreasing order

\[ \lambda_1 \geq \lambda_2 \geq \lambda_3 \geq \cdots \geq \lambda_r > 0 = \cdots = 0. \]

• Eigendecomposition of the diagram: $VQ$ is $Q$-symmetric, thus we can find $Z$
such that

\[ VQZ = Z\Lambda, \qquad Z^tQZ = I_p, \tag{2.1} \]

where $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_r, 0, \ldots, 0)$ and
$I_p$ is the identity matrix in $\mathbb{R}^p$.

This generalized eigendecomposition of $VQ$ is often called the (generalized)
PCA of the triple $(X, Q, D)$.

In practical computations, we start by finding the Cholesky decompositions
of $Q$ and $D$, which exist as long as these matrices are symmetric and positive
definite; call these $H^tH = Q$ and $K^tK = D$. Here $H$ and $K$ are upper
triangular. Then we can use the singular value decomposition of $KXH^t$:

\[ KXH^t = UST^t, \quad \text{with } T^tT = I_p, \; U^tU = I_n, \; S \text{ diagonal}, \]

to give

\[ X = K^{-1}UST^t(H^t)^{-1} = K^{-1}UST^t(H^{-1})^t \quad \text{and} \quad X^t = H^{-1}TSU^t(K^t)^{-1}. \]

Thus

\[ HX^tDXH^t = TS^2T^t = T\Lambda T^t \quad \text{with } \Lambda = S^2, \]

and finally we can see that $Z = H^{-1}T$ satisfies (2.1).

The renormalized columns of $Z$, $A = ZS$, are called the principal axes and
satisfy:

\[ A^tQA = \Lambda. \]

Similarly, we can define $L = K^{-1}U$ that satisfies

\[ WDL = L\Lambda, \qquad L^tDL = I_n, \qquad \text{where } \Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_r, 0, \ldots, 0). \tag{2.2} \]

$C = LS$ is usually called the matrix of principal components. It is normed so
that

\[ C^tDC = \Lambda. \]

When we impose that $C$ or $Z$ be of reduced rank $q < \min(n, p)$, we take just
their first $q$ columns, and have thus achieved what is known as the generalized
PCA of rank $q$.

• Transition formulae: Of the four matrices $Z$, $A$, $L$ and $C$ we only have to
compute one; all others are obtained by the transition formulae provided by
the duality property of the diagram:

\[ XQZ = LS = C, \qquad X^tDL = ZS = A. \]

• $\mathrm{Trace}(VQ) = \mathrm{Trace}(WD)$ is often called the inertia of the diagram
(inertia in the sense of Huyghens' inertia formula for instance). The inertia with
regard to a point $a$ of a cloud of $p_i$-weighted points is
$\sum_{i=1}^n p_i d^2(x_i, a)$. When we look at ordinary PCA with $Q = I_p$,
$D = \frac{1}{n} I_n$, and the variables are centered, the inertia is the sum of the
variances of all the variables. If the variables are standardized ($Q$ is the
diagonal matrix of inverse variances), then the inertia is the number of variables $p$.
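
The computational recipe above (Cholesky factors of $Q$ and $D$, then one singular value decomposition) can be sketched numerically. The following is a minimal illustration in Python with NumPy, not the implementation used by any of the packages cited in this paper. The toy data, the uniform weights $D = \frac{1}{n} I_n$ and the standardizing metric $Q$ are assumptions chosen only so that the identities (2.1), (2.2), the transition formulae and the inertia property can be checked; a thin SVD is used, so `L` and `C` hold only the first $p$ columns.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 6, 3

# Hypothetical toy data: uniform weights D = I/n, metric Q = inverse variances.
D = np.eye(n) / n
X = rng.normal(size=(n, p))
X = X - X.mean(axis=0)               # centre with respect to D (uniform weights)
Q = np.diag(1.0 / X.var(axis=0))     # standardizing metric

# Upper-triangular Cholesky factors: H^t H = Q and K^t K = D.
# (np.linalg.cholesky returns the lower factor, hence the transposes.)
H = np.linalg.cholesky(Q).T
K = np.linalg.cholesky(D).T

# Thin singular value decomposition K X H^t = U S T^t.
U, s, Tt = np.linalg.svd(K @ X @ H.T, full_matrices=False)
T = Tt.T
S = np.diag(s)
Lam = np.diag(s ** 2)                # Lambda = S^2

Z = np.linalg.solve(H, T)            # Z = H^{-1} T
L = np.linalg.solve(K, U)            # L = K^{-1} U (first p columns only)
A = Z @ S                            # principal axes
C = L @ S                            # principal components

V = X.T @ D @ X
W = X @ Q @ X.T

print("eigenvalues:", np.round(s ** 2, 4))
print("inertia Tr(VQ):", round(float(np.trace(V @ Q)), 6))
```

With standardized variables the printed inertia equals the number of variables, here 3, as stated in the last property above.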



2.2. Comparing two diagrams: the RV coefficient 

Many problems can be rephrased in terms of comparison of two "duality diagrams" 
or put more simply, two characterizing operators, built from two "triples", usually 
with one of the triples being a response or having constraints imposed on it. We 
usually try to make one triple match the other in some optimal way. 

To compare two symmetric operators, there is either a vector covariance
$\mathrm{covV}(O_1, O_2) = \mathrm{Tr}(O_1^t O_2)$ or a vector correlation [8]

\[ RV(O_1, O_2) = \frac{\mathrm{Tr}(O_1^t O_2)}{\sqrt{\mathrm{Tr}(O_1^t O_1)\, \mathrm{Tr}(O_2^t O_2)}}. \]

If we were in the special case of comparing two variables $x$ and $y$ then the
computation of the RV coefficient comparing the two triples
$(x_{n \times 1}, 1, \frac{1}{n} I_n)$ and $(y_{n \times 1}, 1, \frac{1}{n} I_n)$ would
give the square of the correlation between the variables, $RV = \rho^2$. Thus we
see that in general the RV coefficient is an extension of the notion of
correlation to the multivariate context.
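
The special case of two single variables can be checked numerically. The sketch below is only an illustration under assumed simulated data: the helper name `rv_coefficient` is hypothetical, and the weights are taken uniform, $D = \frac{1}{n} I_n$, so that the characterizing operators are symmetric matrices.

```python
import numpy as np

def rv_coefficient(O1, O2):
    """Vector correlation Tr(O1^t O2) / sqrt(Tr(O1^t O1) Tr(O2^t O2))."""
    num = np.trace(O1.T @ O2)
    den = np.sqrt(np.trace(O1.T @ O1) * np.trace(O2.T @ O2))
    return num / den

rng = np.random.default_rng(1)
n = 50

# Hypothetical simulated variables, centred, stored as n x 1 matrices.
x = rng.normal(size=(n, 1))
y = 0.6 * x + rng.normal(size=(n, 1))
x = x - x.mean()
y = y - y.mean()

D = np.eye(n) / n
Ox = x @ x.T @ D      # characterizing operator X Q X^t D with Q = 1
Oy = y @ y.T @ D

rv = rv_coefficient(Ox, Oy)
rho = np.corrcoef(x.ravel(), y.ravel())[0, 1]
print(rv, rho ** 2)   # the two values agree up to rounding error
```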
Generalized PCA of rank $q$ of a $D$-centered matrix $X$ as defined above can be
seen as providing the best approximation $F$ in the RV-sense. To be more precise, we
are looking for the matrix $F$ of rank $q$ which, once inserted in a triple with the
same weights on the observations $D$ and no weighting of the variables, will maximize
the RV coefficient between characterizing operators. Thus $F$ is the choice of matrix
of rank $q < p$ that maximizes

\[ RV(XQX^tD, FF^tD) = \frac{\mathrm{Tr}(XQX^tDFF^tD)}{\sqrt{\mathrm{Tr}\big((XQX^tD)^2\big)\, \mathrm{Tr}\big((FF^tD)^2\big)}}. \]

This maximum is attained where $F$ is chosen as the matrix combining the first
$q$ eigenvectors of $XQX^tD$ normed so that $F^tDF = \Lambda_q$, the diagonal matrix
where only the first $q$ eigenvalues are non zero. The maximum RV is

\[ RV_{\max} = \sqrt{\frac{\sum_{i=1}^{q} \lambda_i^2}{\sum_{i=1}^{p} \lambda_i^2}}. \]



Of course, classical PCA has $D = \frac{1}{n} I$, $Q = I$, but the extra flexibility
is often useful. We define the distance between triplets $(X, Q, D)$ and $(Z, P, D)$,
where $Z$ is also $n \times p$, as the distance deduced from the RV inner product
between operators $XQX^tD$ and $ZPZ^tD$. In fact, the reason the French like this
scheme so much is that most multivariate linear methods can be reframed in these
terms. We will give a few examples such as Principal Component Analysis (PCA in
English, ACP in French), Correspondence Analysis (CA in English, AFC in French),
Discriminant Analysis (LDA in English, AFD in French), PCA with regard to
instrumental variables (PCAIV in English, ACPVI in French) and Canonical
Correlation Analysis (CCA in English, AC in French).



2.3. Explaining one diagram by another 

Principal Component Analysis with respect to Instrumental Variables was a
technique developed by C. R. Rao [25] to find the best set of coefficients in the
multivariate regression setting where the response is multivariate, given by a
matrix $Y$. In terms of diagrams and RV coefficients, this problem can be rephrased
as that of finding $M$ to associate to $X$ so that $(X, M, D)$ is as close as possible
to $(Y, Q, D)$ in the RV sense.

The answer is provided by defining $M$ such that

\[ YQY^tD = \lambda XMX^tD. \]

If this is possible then the two eigendecompositions of the triple give the same
answers. We simplify notation by the following abbreviations:

\[ X^tDX = S_{xx}, \qquad Y^tDY = S_{yy}, \qquad X^tDY = S_{xy} \]

and

\[ R = S_{xx}^{-1} S_{xy} Q S_{yx} S_{xx}^{-1}. \]

Then

\[ \| YQY^tD - XMX^tD \|^2 = \| YQY^tD - XRX^tD \|^2 + \| XRX^tD - XMX^tD \|^2. \]

The first term on the right hand side does not depend on $M$, and the second term
will be zero for the choice $M = R$.
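
The decomposition of the squared norm can be verified on simulated data. The sketch below assumes uniform weights $D = \frac{1}{n} I_n$, $Q = I$, and the norm $\|O\|^2 = \mathrm{Tr}(O^tO)$ induced by the vector covariance of Section 2.2; the candidate metric `M` is an arbitrary symmetric matrix generated for illustration, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, k = 30, 3, 4

# Hypothetical centred predictor and response matrices.
X = rng.normal(size=(n, p))
X = X - X.mean(axis=0)
Y = rng.normal(size=(n, k))
Y = Y - Y.mean(axis=0)

D = np.eye(n) / n                    # uniform weights (assumption)
Q = np.eye(k)                        # identity metric on the responses (assumption)

Sxx = X.T @ D @ X
Sxy = X.T @ D @ Y
Syx = Sxy.T
Sxx_inv = np.linalg.inv(Sxx)
R = Sxx_inv @ Sxy @ Q @ Syx @ Sxx_inv

def sqnorm(O):
    """Squared norm Tr(O^t O) of an operator."""
    return np.trace(O.T @ O)

G = rng.normal(size=(p, p))
M = G @ G.T                          # arbitrary symmetric candidate metric

OY = Y @ Q @ Y.T @ D
lhs = sqnorm(OY - X @ M @ X.T @ D)
rhs = sqnorm(OY - X @ R @ X.T @ D) + sqnorm(X @ R @ X.T @ D - X @ M @ X.T @ D)
print(lhs, rhs)                      # equal up to rounding error
```

Because the first term of `rhs` does not involve `M`, any other choice of `M` can only increase the left-hand side relative to the choice `M = R`.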

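If we add the extra constraint that we only allow ourselves a rank $q$
approximation, with $q < \min(\mathrm{rank}(X), \mathrm{rank}(Y))$, the optimal choice of a
positive definite matrix $M$ is to take $M = RBB^tR$ where the columns of $B$ are the
eigenvectors of $X^tDXR$ with:

\[ B = \left( \frac{1}{\sqrt{\lambda_1}} \beta_1, \ldots, \frac{1}{\sqrt{\lambda_q}} \beta_q \right) \quad \text{such that} \quad X^tDXR\beta_k = \lambda_k \beta_k, \quad k = 1, \ldots, q, \quad \lambda_1 > \lambda_2 > \cdots > \lambda_q. \]

The PCA with regard to instrumental variables of rank $q$ is equivalent to the PCA
of rank $q$ of the triple $(X, R, D)$ where

\[ R = S_{xx}^{-1} S_{xy} Q S_{yx} S_{xx}^{-1}. \]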



2.4. One diagram to replace two diagrams



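Canonical correlation analysis was introduced by Hotelling [18] to find the common
structure in two sets of variables $X_1$ and $X_2$ measured on the same observations.
This is equivalent to merging the two matrices columnwise to form a large matrix
with $n$ rows and $p_1 + p_2$ columns and taking as the weighting of the variables the
matrix defined by the two diagonal blocks $(X_1^tDX_1)^{-1}$ and $(X_2^tDX_2)^{-1}$:

\[ Q = \begin{pmatrix} (X_1^tDX_1)^{-1} & 0 \\ 0 & (X_2^tDX_2)^{-1} \end{pmatrix}. \]

The diagrams of $X_1$ and $X_2$, with operators $W_1$ and $W_2$, are merged into the
single diagram

\[
\begin{array}{ccc}
\mathbb{R}^{(p_1+p_2)*} & \xrightarrow{\;[X_1; X_2]\;} & \mathbb{R}^{n} \\
Q \uparrow \; \downarrow V & & D \downarrow \; \uparrow W \\
\mathbb{R}^{p_1+p_2} & \xleftarrow{\;[X_1; X_2]^t\;} & \mathbb{R}^{n*}
\end{array}
\]

This analysis gives the same eigenvectors as the analysis of the triple
$(X_2^tDX_1, (X_1^tDX_1)^{-1}, (X_2^tDX_2)^{-1})$, also known as the canonical
correlation analysis of $X_1$ and $X_2$. These eigenvectors are known as the
canonical variables.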



2.5. Discriminant analysis 

If we want to find linear combinations of the original variables $X_{n \times p}$ that
characterize best the group structure of the points given by a zero/one group coding
matrix $Y$, with as many columns as groups (call this number $g$), we can phrase the
problem as a duality diagram. Suppose that the observations are given individual
weights in the diagonal matrix $D$, and that the variables are centered with regard
to these weights.
Let $A$ be the $g \times p$ matrix of group means in each of the $p$ variables. This
satisfies

\[ Y^tDX = \Delta_Y A \quad \text{where} \quad \Delta_Y = Y^tDY = \mathrm{diag}(w_1, w_2, \ldots, w_g), \qquad w_k = \sum_{i : y_{ik} = 1} d_i. \]

The $w_k$'s are the group weights, as they are the sums of the weights as defined by
$D$ for all the elements in that group. Call $T$ the matrix $T = X^tDX$; in the standard
case with all diagonal elements of $D$ equal to $\frac{1}{n}$ this is just the standard
variance-covariance, otherwise it is a generalization thereof. The generalized
between group variance-covariance is $B = A^t \Delta_Y A$, and we call the within group
variance-covariance the matrix $W = (X - YA)^t D (X - YA)$.

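Proposition 1. (A generalized Huyghens' formula).

\[ T = B + W. \]

Proof. Expanding $W$ gives

\[
\begin{aligned}
W &= X^tDX - X^tDYA - A^tY^tDX + A^tY^tDYA \\
  &= T - A^t\Delta_Y A - A^t\Delta_Y A + A^t\Delta_Y A \\
  &= T - B. \qquad \Box
\end{aligned}
\]

The duality diagram for linear discriminant analysis is

\[
\begin{array}{ccc}
\mathbb{R}^{p*} & \xrightarrow{\;A\;} & \mathbb{R}^{g} \\
T^{-1} \uparrow \; \downarrow B & & \Delta_Y \downarrow \; \uparrow AT^{-1}A^t \\
\mathbb{R}^{p} & \xleftarrow{\;A^t\;} & \mathbb{R}^{g*}
\end{array}
\]

This corresponds to the triple $(A, T^{-1}, \Delta_Y)$, because

\[ (X^tDY)\Delta_Y^{-1}(Y^tDX) = A^t\Delta_Y A, \]

and gives equivalent results to the triple $(Y^tDX, T^{-1}, \Delta_Y^{-1})$.

The discriminating variables are the eigenvectors of the operator

\[ A^t\Delta_Y A T^{-1}. \]

They can also be seen as the PCA with regard to instrumental variables of
$(Y, \Delta_Y^{-1}, D)$ with regard to $(X, M, D)$.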
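
Proposition 1 and the identity $Y^tDX = \Delta_Y A$ can be checked on a small example. The sketch below uses hypothetical simulated data with three groups and unequal observation weights; the group labels, weights and variable names are assumptions made only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, g = 9, 2, 3

labels = np.repeat(np.arange(g), n // g)     # hypothetical group labels
Y = np.eye(g)[labels]                        # n x g zero/one group coding matrix

d = rng.uniform(0.5, 1.5, size=n)
d = d / d.sum()                              # observation weights summing to one
D = np.diag(d)

X = rng.normal(size=(n, p))
X = X - d @ X                                # centre with respect to the weights D

Delta_Y = Y.T @ D @ Y                        # diagonal matrix of group weights w_k
A = np.linalg.solve(Delta_Y, Y.T @ D @ X)    # g x p matrix of weighted group means

T = X.T @ D @ X                              # total variance-covariance
B = A.T @ Delta_Y @ A                        # between group part
W = (X - Y @ A).T @ D @ (X - Y @ A)          # within group part

print(np.round(T - (B + W), 12))             # zero matrix up to rounding error
```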



2.6. Correspondence analysis 

Correspondence analysis can be used to analyse several types of multivariate data. 
All involve some categorical variables. Here are some examples of the type of data 
that can be decomposed using this method: 

• Contingency Tables (cross-tabulation of two categorical variables). 

• Multiple Contingency Tables (cross-tabulation of several categorical vari- 
ables). 
• Binary tables obtained by cutting continuous variables into classes and then
recoding both these variables and any extra categorical variables into 0/1
tables, 1 indicating presence in that class. So for instance a continuous variable
cut into three classes will provide three new binary variables of which only
one can take the value one for any given observation.

To first approximation, correspondence analysis can be understood as an extension
of principal components analysis (PCA) where the variance in PCA is replaced
by an inertia proportional to the distance of the table from independence. CA
decomposes this measure of departure from independence along axes that are
orthogonal according to a $\chi^2$ inner product. If we are comparing two categorical
variables, the simplest possible model is that of independence, in which case the
counts in the table would obey approximately the margin products identity. Consider
an $m \times p$ contingency table $N$ with
$n = \sum_{i=1}^{m} \sum_{j=1}^{p} n_{ij} = n_{\cdot\cdot}$ observations and associated
frequency matrix $F = \frac{N}{n}$. Under independence, the approximation

\[ n_{ij} \doteq \frac{n_{i\cdot}}{n} \frac{n_{\cdot j}}{n} \, n \]

can also be written $N \doteq n \, r c^t$, where

\[ r = \frac{1}{n} N 1_p \]

is the vector of row sums of $F$ and $c = \frac{1}{n} N^t 1_m$ is the vector of column
sums. The departure from independence is measured by the $\chi^2$ statistic

\[ \mathcal{X}^2 = \sum_{i,j} \frac{\left( n_{ij} - \frac{n_{i\cdot} n_{\cdot j}}{n} \right)^2}{\frac{n_{i\cdot} n_{\cdot j}}{n}}. \]

Under the usual validity assumptions that the cell counts $n_{ij}$ are not too small,
this statistic is distributed as a $\chi^2$ with $(m - 1)(p - 1)$ degrees of freedom if
the data are independent. If we do not reject independence, there is no more to be
said about the table, no interaction of interest to analyse. There is in fact no
'multivariate' effect.

On the contrary if this statistic is large, we decompose it into one dimensional 
components. 

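Correspondence analysis is equivalent to the eigendecomposition of the triple
$(X, Q, D)$ with

\[ X = D_r^{-1} F D_c^{-1}, \qquad Q = D_c, \qquad D = D_r, \qquad D_c = \mathrm{diag}(c), \qquad D_r = \mathrm{diag}(r). \]

Here $X^t D_r 1_m = 1_p$: the average of each column is one.

Notes:

1. Consider the matrix $D_r^{-1}FD_c^{-1}$ and take the principal components with
regard to the weights $D_r$ for the rows and $D_c$ for the columns.
The recentered matrix $D_r^{-1}FD_c^{-1} - 1_m 1_p^t$ has a generalized singular value
decomposition

\[ D_r^{-1}FD_c^{-1} - 1_m 1_p^t = USV^t, \quad \text{with } U^tD_rU = I_m, \; V^tD_cV = I_p, \]

having total inertia:

\[ \mathrm{Tr}\left[ D_c \left( D_r^{-1}FD_c^{-1} - 1_m 1_p^t \right)^t D_r \left( D_r^{-1}FD_c^{-1} - 1_m 1_p^t \right) \right] = \frac{\mathcal{X}^2}{n}. \]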
2. PCA of the row profiles FDr , taken with weight matrix Dc and the metric

3. Notice that 

and the row and columns profiles are centered 
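
The link between the total inertia of the recentered matrix in note 1 and the chi-square statistic can be illustrated numerically. The sketch below uses a hypothetical $3 \times 4$ table of counts; the generalized singular value decomposition is obtained from an ordinary SVD of the matrix rescaled by $D_r^{1/2}$ and $D_c^{1/2}$, which is one possible way to compute it and not necessarily the route taken by the packages cited in this paper. A thin SVD is used, so `U` and `V` hold three columns each.

```python
import numpy as np

# Hypothetical 3 x 4 contingency table of counts.
N = np.array([[20.0, 10.0, 5.0, 15.0],
              [10.0, 25.0, 10.0, 5.0],
              [5.0, 10.0, 30.0, 5.0]])
n = N.sum()
F = N / n
r = F.sum(axis=1)                    # row sums of F
c = F.sum(axis=0)                    # column sums of F

# Pearson chi-square statistic measuring departure from independence.
expected = n * np.outer(r, c)
chi2 = ((N - expected) ** 2 / expected).sum()

# Recentred matrix D_r^{-1} F D_c^{-1} - 1 1^t.
Xc = F / np.outer(r, c) - 1.0

# Generalized SVD through the ordinary SVD of D_r^{1/2} Xc D_c^{1/2}.
scaled = np.sqrt(r)[:, None] * Xc * np.sqrt(c)[None, :]
U0, s, V0t = np.linalg.svd(scaled, full_matrices=False)
U = U0 / np.sqrt(r)[:, None]         # satisfies U^t D_r U = I
V = V0t.T / np.sqrt(c)[:, None]      # satisfies V^t D_c V = I

inertia = (s ** 2).sum()
print(inertia, chi2 / n)             # equal up to rounding error
```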

This method has been rediscovered many times, most recently by Jon Kleinberg
in his method for analyzing Hubs and Authorities [19]. See Fouss, Saerens
and Renders [11] for a detailed comparison.

In statistics the most commonplace use of Correspondence Analysis is in
ordination or seriation, that is, the search for a hidden gradient in contingency
tables. As an example we take data analyzed by Cox and Brandwood [4] and Diaconis [6],
who wanted to seriate Plato's works using the proportion of sentence endings in a
given book with a given stress pattern. The seven books studied here are Republic,
Laws, Critias, Philebus, Politicus, Sophist, Timaeus. We use abbreviations of these
names as our column labels in the data analysis below. The stress patterns use the
last five syllables of every sentence and combine long or short syllables
(abbreviated by - and U in the data below). Thus there are 32 possible stress
patterns, and 32 rows in our contingency table.

We propose the use of correspondence analysis on the table of frequencies of 
sentence endings, for a detailed analysis see Charnomordic and Holmes [2]. 

The first 10 row profiles (as percentages) are as follows: 
etc. (there are 32 rows in all)



The eigenvalue decomposition (called the scree plot) of the chi-square distance 
matrix (see [2]) shows that two axes out of a possible 6 (the matrix is of rank 6) 
will provide a summary of 85% of the departure from independence. This suggests 
that a planar representation will provide a good visual summary of the data. 
Fig 1. Correspondence analysis of Plato's works.



We can see from the plot that there is a seriation that, as in most cases, follows a
parabola or arch [16], with Laws, the latest work, at one extreme and Republic,
the earliest among those studied, at the other.



3. From discriminant analysis to networks



Consider a graph with vertices the members of a social group and edges if two
members interact. We suppose each vertex comes with an observation vector $x_i$,
and that each has the same weight $\frac{1}{n}$. In the extreme case of discriminant
analysis, the graph is supposed to connect all the points of a group in a complete
graph, and be disconnected between observations from different groups. Discriminant
Analysis is just the explanation of this particular graph by linear combinations of
variables. What we propose here is to extend this to more general graphs in a
similar way. We will suppose all the observations are the nodes of the graph and
each has the same weight $\frac{1}{n}$. The basic decomposition of the variance is
written

\[ \mathrm{cov}(x_j, x_k) = t_{jk} = \frac{1}{n} \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)(x_{ik} - \bar{x}_k). \]

Call the group means

\[ \bar{x}_{gj} = \frac{1}{n_g} \sum_{i \in G_g} x_{ij}, \qquad g = 1, \ldots, q, \]

where $n_g$ is the number of observations in group $G_g$.

As in Proposition 1, Huyghens' formula is $t_{jk} = w_{jk} + b_{jk}$, where

\[ w_{jk} = \frac{1}{n} \sum_{g=1}^{q} \sum_{i \in G_g} (x_{ij} - \bar{x}_{gj})(x_{ik} - \bar{x}_{gk}), \]

\[ b_{jk} = \sum_{g=1}^{q} \frac{n_g}{n} (\bar{x}_{gj} - \bar{x}_j)(\bar{x}_{gk} - \bar{x}_k), \]

\[ T = W + B. \]

As we showed above, linear discriminant analysis finds the linear combinations $a$
such that $\frac{a^tBa}{a^tTa}$ is maximized. This is equivalent to maximizing the
quadratic form $a^tBa$ in $a$, subject to the constraint $a^tTa = 1$. As we saw above,
the eigenvalue problem

\[ Ba = \lambda Ta \quad \text{or} \quad T^{-1}Ba = \lambda a \ \text{ if } T^{-1} \text{ exists} \]

provides $a$ as needed. Then $a^tBa = \lambda a^tTa = \lambda$. We extend this to graphs
by relaxing the group definition to partition the variation into local and global
components.



3.1. Decomposing the variance into local and global components 

Lebart was a pioneer in adapting the eigenvector decompositions to cater to spatial
structure in the data [20, 21, 22]. We can again decompose the variance into parts,
but this time the criterion for the decomposition is not defined by group membership
as in LDA but by the neighborhood relation given by the spatial structure. We call
the set of edges of the undirected neighborhood graph $E$. The usual elementwise
definition of covariances is given by

\[ \mathrm{cov}(x_j, x_k) = \frac{1}{n} \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)(x_{ik} - \bar{x}_k) = \frac{1}{2n^2} \sum_{i=1}^{n} \sum_{i'=1}^{n} (x_{ij} - x_{i'j})(x_{ik} - x_{i'k}). \]

For the variances we have

\[ \mathrm{var}(x_j) = \frac{1}{2n^2} \left\{ \sum_{(i,i') \in E} (x_{ij} - x_{i'j})^2 + \sum_{(i,i') \notin E} (x_{ij} - x_{i'j})^2 \right\}. \]

We call $M$ the incidence matrix of the graph: $m_{ii'} = 1$ if $(i, i') \in E$. The
degree of vertex $i$ is $m_i = \sum_{i'=1}^{n} m_{ii'}$. We take the convention that
there are no self loops. Then another way of writing the variance formula is

\[ \mathrm{var}(x_j) = \frac{1}{2n^2} \left\{ \sum_{i=1}^{n} \sum_{i'=1}^{n} m_{ii'} (x_{ij} - x_{i'j})^2 + \sum_{(i,i') \notin E} (x_{ij} - x_{i'j})^2 \right\}. \]

Call local variance

\[ \mathrm{var}_{loc}(x_j) = \frac{1}{2m} \sum_{i=1}^{n} \sum_{i'=1}^{n} m_{ii'} (x_{ij} - x_{i'j})^2, \]

where $m = \sum_{i=1}^{n} \sum_{i'=1}^{n} m_{ii'}$. The total variance is the variance of
the complete graph. Geary's ratio [14] is used to see whether the variable $x_j$ can
be considered as independent of the graph structure. If the neighboring values of
$x_j$ seem positively correlated then the local variance will only be an
underestimate of the variance:

\[ G = c(x_j) = \frac{\mathrm{var}_{loc}(x_j)}{\mathrm{var}(x_j)}. \]

Call $D$ the diagonal matrix with the total degrees of each node in the diagonal,
$D = \mathrm{diag}(m_i)$.

For all variables taken together, $j = 1, \ldots, p$, note the local covariance matrix
$V = \frac{1}{2m} X^t(D - M)X$. If the graph is just made of disjoint groups of the
same size, this is proportional to the within class variance-covariance matrix $W$.
The proportionality can be accomplished by taking the average of the sum of squares
to the average of the neighboring nodes [23]. We can generalize the Geary index
to account for irregular graphs coherently. In this case we weight each node by its
degree. Then we can write the Geary ratio for any $n$-vector $x$ as

\[ c(x) = \frac{x^t(D - M)x}{x^tDx}, \qquad D = \mathrm{diag}(m_1, m_2, \ldots, m_n). \]

We can ask for the coordinate(s) that are the most correlated to the graph structure;
then if we want to minimize the Geary ratio, we choose $x$ such that $c(x)$ is minimal.
This is equivalent to minimizing $x^t(D - M)x$ under the constraint $x^tDx = 1$. It
can be solved by finding the smallest eigenvalue $\mu$ with eigenvector $x$ such that:

\[
\begin{aligned}
(D - M)x &= \mu Dx, \\
D^{-1}(D - M)x &= \mu x, \\
(1 - \mu)x &= D^{-1}Mx.
\end{aligned}
\]

This is exactly the defining equation of the correspondence analysis of the matrix
$M$. This can be extended to as many coordinates as we like; in particular we can
take the eigenvectors associated with the 2 largest eigenvalues and provide the best
planar representation of the graph in this way.
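
The eigenproblem above can be sketched on a small graph. The example below is an illustration under assumptions: the five-vertex graph is hypothetical, and the generalized problem $(D - M)x = \mu Dx$ is solved through the symmetric matrix $D^{-1/2}(D - M)D^{-1/2}$, which is one convenient numerical route. For a connected graph the smallest eigenvalue is $\mu = 0$ with a constant eigenvector, so the informative coordinates are those that follow it.

```python
import numpy as np

# Hypothetical undirected graph: a path on 5 vertices plus the edge (0, 2).
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 2)]
n = 5
M = np.zeros((n, n))
for i, j in edges:
    M[i, j] = M[j, i] = 1.0          # symmetric 0/1 matrix, no self loops
deg = M.sum(axis=1)                  # vertex degrees m_i
D = np.diag(deg)

def geary(x):
    """Degree-weighted Geary ratio x^t (D - M) x / x^t D x."""
    return (x @ (D - M) @ x) / (x @ D @ x)

# Solve (D - M) x = mu D x through the symmetric matrix D^{-1/2} (D - M) D^{-1/2}.
d_isqrt = 1.0 / np.sqrt(deg)
sym = d_isqrt[:, None] * (D - M) * d_isqrt[None, :]
mu, Yv = np.linalg.eigh(sym)         # eigenvalues in ascending order
Xv = d_isqrt[:, None] * Yv           # columns x normed so that x^t D x = 1

print(np.round(mu, 4))               # first value is 0 (constant eigenvector)
print(np.round(Xv[:, 1:3], 4))       # two coordinates for a planar display
```

Each column of `Xv` satisfies $(1 - \mu)x = D^{-1}Mx$, and its Geary ratio equals the corresponding eigenvalue.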

3.2. Regression of graphs on node covariates 

The covariables measured on the nodes can be essential to understanding the fine
structure of graphs. We call $X$ the $n \times p$ matrix of measurements at the
vertices of the graph; they may be a combination of both categorical variables (gene
families, GO classes) and continuous measurements (expression scores). We can apply
the PCAIV method defined in Section 2 to the eigenvectors of the graph defined above.
This provides a method that uses the covariates in $X$ to explain the graph. To be
more precise, given a graph $(V, E)$ with adjacency matrix $M$, define the Laplacian

\[ L = D^{-1}(M - I), \qquad D = \mathrm{diag}(d_1, d_2, \ldots, d_n) \text{ the diagonal matrix of degrees}. \]

Using the eigenanalysis of the graph, we can summarize the graph with a few
variables, the first few relevant eigenvectors of $L$. These can then be regressed on
the covariates using Principal Components with respect to Instrumental Variables
[25] as defined above to find the linear combination of node covariates that explains
the graph variables the best.
Appendix A: Resources

A.1. Reading

There are few references in English explaining the duality/operator point of view, 
apart from the already cited references of Escoufier [8, 10]. Frédérique Glaçon's
PhD thesis [15] (in French) clearly lays out the duality principle before going on to 
explain its application to the conjoint analysis of several matrices, or data cubes. 
The interested reader fluent in French could also consult any one of several Masters 
level textbooks on the subject for many details and examples: 

• Brigitte Escofier and Jérôme Pagès [7] have a textbook with many examples.
Although their approach is geometric, they do not delve into the Duality
Diagram beyond explaining on page 100 its use in transition formulae
between eigenbases of the different spaces.

• [22] is one of the broader books on multivariate analyses, making connections
between modern uses of eigendecomposition techniques, clustering and
segmentation. This book is unique in its chapter on stability and validation of
results (without going as far as speaking of inference).

• Cailliez and Pagès [1] is hard to find, but was the first textbook completely
based on the diagram approach; as was the case in the earlier literature, they
use transposed matrices.

A.2. Software

The methods described in this article are all available in the form of R packages
which I recommend. The most complete package is ade4 [3], which covers almost
all the problems I mention except that of regressing graphs on covariates. However,
a complete understanding of the duality diagram terminology and philosophy is
necessary, as these provide the building blocks for all the functions in the form
of a class called dudi (this actually stands for duality diagram). One of the most
important features in all the 'dudi.*' functions is that when the argument scannf
is at its default value TRUE, the first step imposed on the user is the perusal of the
scree plot of eigenvalues. This can be very important, as choosing to retain 2 values
by default before consulting the eigenvalues can lead to the main mistake that can
be made when using these techniques: the separation of two close eigenvalues. When
two eigenvalues are close the plane will be stable, but not each individual axis or
principal component. This leads to erroneous results if, for instance, the 2nd and
3rd eigenvalues are very close and the user chooses to take 2 axes [17].

Another useful addition also comes from the ecological community and is called 
vegan. Here is a list of suggested functions from several packages: 

• Principal Components Analysis (PCA) is available in prcomp and princomp
in the standard package stats, as pca in vegan and as dudi.pca in ade4.

• Two versions of PCAIV are available: one is called Redundancy Analysis
(RDA) and is available as rda in vegan, the other is pcaiv in ade4.

• Correspondence Analysis (CA) is available in cca in vegan and as dudi.coa
in ade4.

• Discriminant analysis is available as lda in MASS and as discrimin in ade4.

• Canonical Correlation Analysis is available in cancor in stats (beware: cca
in ade4 is Canonical Correspondence Analysis).

• STATIS (Conjoint analysis of several tables) is available in ade4.